This report explores whether short-term volatility forecasts can improve quote placement for high-frequency trading (HFT) firms and how well single-stock models generalise across correlated assets. We developed a two-pipeline system: (1) a volatility forecasting model using LSTM, and (2) a quoting strategy model using XGBoost, informed by predicted volatility and order book features. The LSTM model outperformed traditional baselines (WLS, GARCH), reducing RMSE by up to 36% and QLIKE by over 60%, though it underreacted to volatility spikes. The quoting model achieved a ~46% hit ratio but offered limited pricing advantage. Our findings highlight the potential of volatility-informed quoting, while also revealing challenges in responsiveness, mid-price modelling, and generalisation. Final models are deployed in a dashboard to assist market makers with risk-aware quote placement.
Background
Problem Context and Motivations
Market makers profit from the bid-ask spread, the difference between the highest price a buyer is willing to pay and the lowest price a seller is willing to accept (O’Hara (1995)). Volatility, which measures the size of price fluctuations in financial markets, introduces both risk and opportunity for market makers (Hasbrouck (2006)). Low volatility is associated with stable prices and tighter bid-ask spreads, so profit comes mainly from high trade volume. During high volatility, larger price fluctuations create uncertainty and risk, and market makers widen spreads to protect against potential losses. Bollerslev and Melvin (1994) found a strong positive relationship between volatility and spreads, with highly statistically significant coefficients linking conditional variance to spread levels.
For trading firms like Optiver, accurately forecasting short-term volatility is crucial for setting competitive quotes and managing execution risk, especially in HFT and options markets (Optiver (2021)). Motivated by this, our study focuses on using predicted short-term volatility to optimise quoting strategies based on the bid-ask spread. This study also considers the effect of inter-stock correlation on model performance by training a model on one stock and testing it on both a highly correlated stock and a weakly correlated stock. The aim is to see whether information about one stock can be used to make or improve predictions about another.
This context leads to the main research question of this study:
How can short-term volatility forecasts be leveraged to optimise bid-ask spread quoting strategies—balancing execution risk and market competitiveness—for HFT firms? Additionally, how does inter-stock correlation influence prediction accuracy and quoting effectiveness when models are applied to unseen stocks?
Prior Work and Relevance
Nelson, Pereira, and Oliveira (2017) demonstrated that Long Short-Term Memory (LSTM) networks can be effectively applied to financial time series forecasting, achieving an average accuracy of 55.9% in predicting short-term stock price movements. Recent work by X. Zhang et al. (2019) introduced the attention-enhanced AT-LSTM model, which significantly outperformed both traditional ARIMA and standard LSTM models in forecasting financial time series. The attention mechanism dynamically assigns weights to different time steps, helping the model focus on the most relevant historical information for improved prediction accuracy. Their results showed that AT-LSTM achieved the lowest Mean Absolute Percentage Error (MAPE) values across multiple indices (including the Russell 2000, DJIA, and Nasdaq), consistently outperforming ARIMA.
These studies suggest that LSTM-based approaches are well suited to high-frequency volatility forecasting in our context.
Dataset
```python
DATA_FOLDER = "data"
FEATURE_FILE = "order_book_feature.parquet"
TARGET_FILE = "order_book_target.parquet"

# Primary stock ID for model training
MODEL_STOCK_ID = 50200
# Number of time_ids to use for training
MODEL_TIMEID_COUNT = 50
# Other stocks for cross-stock performance comparison
CROSS_STOCK_IDS = [22753, 104919]
# Number of time_ids per stock for comparison
CROSS_TIMEID_COUNT = 10

feature_path = os.path.join(DATA_FOLDER, FEATURE_FILE)
target_path = os.path.join(DATA_FOLDER, TARGET_FILE)
df_features = pd.read_parquet(feature_path, engine="pyarrow")
df_target = pd.read_parquet(target_path, engine="pyarrow")

# Concatenate feature and target, then sort
df_all = (
    pd.concat([df_features, df_target], axis=0)
    .sort_values(by=["stock_id", "time_id", "seconds_in_bucket"])
    .reset_index(drop=True)
)

# Prepare main-stock training dataset
df_main_raw = df_all[df_all["stock_id"] == MODEL_STOCK_ID].copy()
main_time_ids = df_main_raw["time_id"].unique()[:MODEL_TIMEID_COUNT]

# df_main_train: training feature set for the primary stock (50 time_ids)
df_main_train = (
    df_main_raw[df_main_raw["time_id"].isin(main_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

unique_time_ids = df_main_raw["time_id"].unique()
test_time_ids = unique_time_ids[MODEL_TIMEID_COUNT : MODEL_TIMEID_COUNT + 10]

# df_main_test: test feature set for the primary stock (next 10 time_ids)
df_main_test = (
    df_main_raw[df_main_raw["time_id"].isin(test_time_ids)]
    .pipe(util.create_snapshot_features)
    .reset_index(drop=True)
)

# Prepare cross-stock comparison datasets
# df_cross_features: dict of feature sets for each comparison stock (10 time_ids)
df_cross_features = {}
for stock_id in CROSS_STOCK_IDS:
    df_stock_raw = df_all[df_all["stock_id"] == stock_id].copy()
    time_ids_cross = df_stock_raw["time_id"].unique()[:CROSS_TIMEID_COUNT]
    df_stock_feat = (
        df_stock_raw[df_stock_raw["time_id"].isin(time_ids_cross)]
        .pipe(util.create_snapshot_features)
        .reset_index(drop=True)
    )
    df_cross_features[stock_id] = df_stock_feat
```
This study uses the Optiver Additional Dataset, which contains sequential ultra-high-frequency limit order book (LOB) snapshots for multiple stocks, structured into hourly trading windows. Specifically, order_book_feature.parquet includes 17.6 million rows from the first 30 minutes of each trading hour, and order_book_target.parquet includes 17.9 million rows from the last 30 minutes. Each row is indexed by stock_id, time_id, and seconds_in_bucket (0–3599), together defining a specific stock-hour snapshot.
The feature and target datasets were concatenated and sorted by stock_id, time_id, and seconds_in_bucket to reconstruct complete 1-hour trading periods. For modelling, we focus on a single primary stock (stock_id = 50200) for training and testing, and two additional stocks (stock_id = 22753 and 104919) for cross-stock generalisation analysis.
Methodology
Our overall methodology consists of two main pipelines: the first focuses on forecasting short-term volatility, and the second uses these forecasts to inform quoting strategies (see Figure 1).
```python
from IPython.display import Image

Image(filename='resources/figure1.png')
```
Figure 1: Schematic workflow overview.
Feature Engineering and Data Preparation
Feature engineering was applied to the reconstructed dataset to generate meaningful variables capturing market dynamics. For the volatility forecasting pipeline, engineered features include:
Mid price: average of bid and ask prices
Bid-ask spread: difference between the lowest ask price and the highest bid price
Weighted average price
Spread percentage
Order book imbalance
Depth ratio
Log return and log WAP change
Rolling standard deviation of log returns
Spread z-score
Volume imbalance
For the volatility-informed quoting strategy pipeline, the key input feature is the predicted short-term volatility (predicted_volatility_lead1) from the final LSTM model, combined with order book-based features.
Detailed feature definitions and formulas are provided in the Appendix A: Feature Definitions.
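As an illustration of the order book features listed above, the following is a minimal sketch for a single snapshot frame. Only the mid price and bid-ask spread follow the definitions in Appendix A; the column names (`bid_price1`, `ask_price1`, `bid_size1`, `ask_size1`), the function name `add_basic_features`, and the formulas for WAP, spread percentage, imbalance, depth ratio, and log return are assumptions made for this example and may differ from `util.create_snapshot_features`.

```python
import numpy as np
import pandas as pd


def add_basic_features(book: pd.DataFrame) -> pd.DataFrame:
    """Add a few order book features to one stock/time_id snapshot frame."""
    out = book.copy()
    total_size = out["bid_size1"] + out["ask_size1"]

    # Mid price and bid-ask spread, as defined in Appendix A.
    out["mid_price"] = (out["bid_price1"] + out["ask_price1"]) / 2
    out["bid_ask_spread"] = out["ask_price1"] - out["bid_price1"]

    # Assumed definitions (not confirmed by the report).
    out["wap"] = (
        out["bid_price1"] * out["ask_size1"] + out["ask_price1"] * out["bid_size1"]
    ) / total_size
    out["spread_pct"] = out["bid_ask_spread"] / out["mid_price"]
    out["imbalance"] = (out["bid_size1"] - out["ask_size1"]) / total_size
    out["depth_ratio"] = out["bid_size1"] / out["ask_size1"]
    out["log_return"] = np.log(out["wap"]).diff().fillna(0.0)
    return out


# Two synthetic snapshots, for illustration only.
example = pd.DataFrame(
    {
        "bid_price1": [99.0, 99.5],
        "ask_price1": [101.0, 100.5],
        "bid_size1": [100, 300],
        "ask_size1": [100, 100],
    }
)
features = add_basic_features(example)
```

In the second synthetic snapshot the bid side is three times deeper than the ask side, so the size-weighted price sits above the mid price and the imbalance is positive.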
To capture short-term volatility dynamics, we used a rolling window approach, segmenting the high-frequency order book data into overlapping samples. We evaluated different configurations—window size (W), forecast horizon (H), and step size (S)—using a Random Forest baseline to identify the best balance between prediction accuracy (MSE) and sample richness. Results showed a U-shaped error curve, with overly large or small windows hurting performance. We selected W=330s, H=10s, and S=5s as the optimal configuration for all model development.
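The segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: `rolling_windows` is a hypothetical helper, the series is assumed to have one observation per second with no gaps, and the exact slicing convention used in the project may differ.

```python
import numpy as np


def rolling_windows(series, window=330, horizon=10, step=5):
    """Yield (input_window, target_window) pairs from a 1-D series.

    window, horizon and step correspond to W, H and S in the text;
    the slicing convention itself is an assumption.
    """
    values = np.asarray(series, dtype=float)
    last_start = len(values) - window - horizon
    for start in range(0, last_start + 1, step):
        x = values[start : start + window]
        y = values[start + window : start + window + horizon]
        yield x, y


# One hour of per-second data (3600 points), matching seconds_in_bucket 0-3599.
pairs = list(rolling_windows(np.arange(3600)))
```

With W=330, H=10, and S=5, a complete one-hour period yields 653 overlapping samples under this convention, which shows how a small step size increases sample count from limited data.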
The first pipeline focuses on generating accurate short-term volatility predictions, a crucial capability for HFT firms to optimise options pricing and quoting strategies. To this end, we developed a robust model to capture the complex volatility behaviour of the selected stock.
Our models were trained using sequential order book data enhanced with engineered features. Due to computational constraints, we initially restricted model development to 50 consecutive time_ids from stock 50200, using an 80/20 chronological train-test split to maintain temporal consistency.
We trialled three initial candidates for volatility prediction: HAR-RV (fitted with weighted least squares as a linear baseline), GARCH, and Long Short-Term Memory (LSTM) networks. The GARCH orders (p=1, q=2) were chosen by grid search to minimise RMSE. LSTM networks, though less interpretable due to their gated architecture, demonstrated superior capability in capturing long-term dependencies and nonlinear volatility patterns (Srivatsavaya (2023)). We iteratively tuned the LSTM model’s architecture and hyperparameters to achieve the best trade-off between complexity and performance.
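For readers unfamiliar with the linear baseline, the following is a minimal sketch of a HAR-style regression fitted by weighted least squares. The function names, the look-back lengths `(1, 5, 22)`, and the inverse-volatility weighting are assumptions for illustration; the report does not state which lags or weights were used.

```python
import numpy as np


def fit_har_wls(rv, lags=(1, 5, 22)):
    """Regress RV_t on lagged RV averages using weighted least squares."""
    rv = np.asarray(rv, dtype=float)
    longest = max(lags)
    # Design matrix: intercept plus the mean RV over each look-back length.
    X = np.array(
        [[1.0] + [rv[t - k : t].mean() for k in lags] for t in range(longest, len(rv))]
    )
    y = rv[longest:]
    # Assumed weighting: inverse of the first lag term, so calm periods count more.
    weights = 1.0 / np.maximum(X[:, 1], 1e-12)
    root_w = np.sqrt(weights)
    # WLS is ordinary least squares on rows scaled by sqrt(weight).
    beta, *_ = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)
    return beta


def predict_har(rv_history, beta, lags=(1, 5, 22)):
    """One-step-ahead forecast from the most recent max(lags) RV values."""
    rv_history = np.asarray(rv_history, dtype=float)
    x = np.array([1.0] + [rv_history[-k:].mean() for k in lags])
    return float(x @ beta)


rng = np.random.default_rng(0)
rv_series = rng.uniform(0.5, 1.5, size=200)  # synthetic RV values, illustration only
beta = fit_har_wls(rv_series)
next_rv = predict_har(rv_series, beta)
```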
The final volatility forecasting model is a double-layer bidirectional LSTM architecture integrating a Mixture-of-Experts (MoE) approach. It consists of two stacked bidirectional LSTM layers (128 hidden units each) and a dense fusion layer (64 ReLU units). The MoE head features two expert outputs—one capturing normal market behaviour and another focused on spikes—combined via a learned spike probability gating mechanism. To effectively balance precision and stability during training, we employed a two-stage loss strategy: an initial weighted MSE loss focusing on volatility magnitude, followed by a combined log-cosh and focal BCE loss to refine spike detection and robustness. This dual-headed design allows the model to dynamically adapt to different volatility regimes, enhancing both accuracy and interpretability.
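The gating and loss terms described above can be illustrated with a small NumPy sketch. It is not the trained network: the function names, the linear blending rule, and the focal parameter `gamma=2.0` are assumptions, and the report does not state how the log-cosh and focal BCE terms are weighted against each other.

```python
import numpy as np


def moe_forecast(normal_pred, spike_pred, spike_prob):
    """Blend two expert outputs using the spike probability as a soft gate."""
    p = np.clip(np.asarray(spike_prob, dtype=float), 0.0, 1.0)
    return (1.0 - p) * np.asarray(normal_pred) + p * np.asarray(spike_pred)


def log_cosh_loss(y_true, y_pred):
    """Mean log-cosh error: near-quadratic for small errors, near-linear for large."""
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    # log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2), stable for large |x|.
    return np.mean(err + np.log1p(np.exp(-2.0 * err)) - np.log(2.0))


def focal_bce_loss(is_spike, spike_prob, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights labels that are already well classified."""
    y = np.asarray(is_spike, dtype=float)
    p = np.clip(np.asarray(spike_prob, dtype=float), eps, 1.0 - eps)
    p_t = np.where(y == 1.0, p, 1.0 - p)  # probability given to the true class
    return np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t))


blended = moe_forecast(normal_pred=1.0, spike_pred=3.0, spike_prob=0.25)
```

Under this blending rule, a spike probability of 0.25 moves the forecast a quarter of the way from the normal expert towards the spike expert.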
To prevent data leakage, we ensured that the 80/20 train-test split strictly adhered to non-overlapping time_ids. Additional safeguards included separate data generators for training and validation, feature normalisation within each rolling window, and clipping outlier values to stabilise predictions.
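A chronological split on whole time_ids can be sketched as below. The helper `split_time_ids` is hypothetical and assumes the time_ids are already in chronological order.

```python
def split_time_ids(time_ids, train_fraction=0.8):
    """Split an ordered sequence of time_ids into disjoint train and test blocks."""
    ordered = list(dict.fromkeys(time_ids))  # drop duplicates, keep first-seen order
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# 50 consecutive time_ids, as used for the primary stock.
train_ids, test_ids = split_time_ids(range(50))
```

Because rolling windows are built within each time_id, splitting on time_id rather than on individual windows keeps overlapping windows from appearing on both sides of the split.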
Model selection was based on RMSE performance, with further discussion and detailed evaluation metrics provided in the following sections. This thorough approach ensures that the volatility forecasts feeding into our quoting strategy are accurate, robust, and well-suited to real-world trading environments.
This quoting strategy model uses the predicted volatility from the Volatility Forecasting model and current order book signals to generate quoting strategies that adapt to market conditions. This enhances interpretability and helps market makers adjust bid-ask spreads for high-frequency trading (HFT).
Key features include the predicted short-term volatility (predicted_volatility_lead1) and engineered order book features: spread_pct_scaled, wap, imbalance, depth_ratio, log_return, and bid_ask_spread. A rolling window approach with a 330-bucket segment and a 10-bucket stride was used, shifting the bid-ask spread target forward by one step to avoid data leakage.
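The one-step target shift can be sketched with pandas as follows. The column name `spread_next` and the choice to shift within each `time_id` (so that a target never comes from a different trading hour) are assumptions for this example.

```python
import pandas as pd

# Synthetic spreads for two time_ids, illustration only.
frame = pd.DataFrame(
    {
        "time_id": [1, 1, 1, 2, 2],
        "bid_ask_spread": [0.10, 0.12, 0.11, 0.20, 0.18],
    }
)

# Target at row t is the spread observed at t+1 within the same time_id.
frame["spread_next"] = frame.groupby("time_id")["bid_ask_spread"].shift(-1)

# The last row of each time_id has no next-step target and is dropped.
supervised = frame.dropna(subset=["spread_next"])
```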
An XGBoost model was implemented to estimate the next-period bid-ask spread. XGBoost (Extreme Gradient Boosting) is a decision-tree-based ensemble method valued for its accuracy, scalability, and built-in regularisation against overfitting. Although it is not designed for time series data, rolling windows and lagged features allow it to use temporal context (XGBoosting.com (2023)). Z-score normalisation (StandardScaler) was applied to standardise numerical features with high variance. A chronological 80/20 train-test split preserved the order of the time series, alongside 5-fold cross-validation. Hyperparameters were selected by grid search over 768 candidate configurations.
To assess performance, the model was compared against a naive baseline (previous spread as prediction), using metrics including MSE, MAE, RMSE, R², absolute error (AE), squared error (SE), and percentage error (PE). The current mid-price, calculated as the average of the best bid and ask, was used as a simple estimator of the next-period mid-price.
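The naive baseline comparison can be sketched as follows. `naive_baseline_scores` is a hypothetical helper that scores the "previous spread as prediction" rule on a single series; it covers only a subset of the metrics listed above.

```python
import numpy as np


def naive_baseline_scores(spread):
    """Score the rule 'next spread equals current spread' on one series."""
    spread = np.asarray(spread, dtype=float)
    y_true, y_pred = spread[1:], spread[:-1]  # predict s[t+1] with s[t]
    err = y_true - y_pred
    mse = np.mean(err**2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": np.mean(np.abs(err)),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "r2": 1.0 - np.sum(err**2) / ss_tot,
    }


scores = naive_baseline_scores([1.0, 2.0, 2.0, 4.0])
```

A negative R² on this toy series shows that the naive rule can do worse than predicting the mean, which is the bar any learned spread model should clear.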
The final quoting prices were generated using: \[\text{bid} = \text{mid-price} - \frac{\text{spread}}{2}\]\[\text{ask} = \text{mid-price} + \frac{\text{spread}}{2}\]
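The quote construction above amounts to placing the bid and ask symmetrically around the mid-price. A minimal sketch, in which `make_quotes` is a hypothetical helper and the floor at zero for negative spread predictions is an added assumption:

```python
def make_quotes(mid_price, predicted_spread):
    """Return (bid, ask) placed symmetrically around the mid-price."""
    # Guard against a negative predicted spread (assumption, not from the report).
    half = max(predicted_spread, 0.0) / 2.0
    return mid_price - half, mid_price + half


bid, ask = make_quotes(mid_price=100.0, predicted_spread=0.5)
```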
Evaluation Metrics
To assess the performance of our two main models—Volatility Forecasting and Volatility-Informed Quoting Strategy Model—we used complementary evaluation metrics tailored to each stage’s goals.
Volatility Forecasting Model (Pipeline 1)
The primary objective of the volatility forecasting model is to provide accurate short-term volatility predictions that serve as key inputs for the quoting model.
Metrics used include:
RMSE (Root Mean Squared Error): Measures the average magnitude of prediction errors. Smaller errors are critical for providing precise inputs to the quoting model.
QLIKE (Quasi-Likelihood Loss): Focuses on the accuracy of volatility forecasts relative to actual variance, which is important for financial volatility modeling. \[
\text{QLIKE} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i^2}{\hat{y}_i^2} - \log \left( \frac{y_i^2}{\hat{y}_i^2} \right) - 1 \right)
\]
MSE (Mean Squared Error): Provides a standard measure of average squared prediction errors. \[
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
\]
Inference Time: Measures the computational efficiency for each prediction for high-frequency environments.
Among these, RMSE is considered the most critical metric because the quoting model depends on accurate volatility forecasts. Lower RMSE in Pipeline 1 leads to more precise bid-ask spread predictions in Pipeline 2, directly impacting quoting effectiveness.
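The error metrics above can be computed as in the following sketch. `forecast_metrics` is a hypothetical helper; the small `eps` term is an added assumption to avoid division by zero when a forecast is exactly zero.

```python
import numpy as np


def forecast_metrics(y_true, y_pred, eps=1e-12):
    """Compute MSE, RMSE and QLIKE for volatility forecasts."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    mse = np.mean((y - y_hat) ** 2)
    # QLIKE compares realised and forecast variance through their ratio.
    ratio = (y**2 + eps) / (y_hat**2 + eps)
    qlike = np.mean(ratio - np.log(ratio) - 1.0)
    return {"mse": mse, "rmse": np.sqrt(mse), "qlike": qlike}


perfect = forecast_metrics([1.0, 2.0], [1.0, 2.0])
biased = forecast_metrics([1.0, 2.0], [2.0, 2.0])
```

QLIKE is zero for a perfect forecast and grows with the variance ratio's distance from one, so it penalises relative rather than absolute errors.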
Volatility-Informed Quoting Strategy Model (Pipeline 2)
To evaluate the quoting strategy model’s performance, we employed four microstructure-based metrics:
Hit Ratio: Measures how often our quotes are competitive enough to be executed. \[
\begin{aligned}
\text{Hit Ratio} &=
\frac{ \text{Number of competitive quotes} }{ \text{Total number of quotes} } \\\\
& \text{where bid} \geq \text{market bid and ask} \leq \text{market ask}
\end{aligned}
\]
Inside-Spread Quote Ratio: Assesses whether quotes are placed inside the market spread for better execution. \[
\begin{aligned}
\text{Inside-Spread Quote Ratio} &=
\frac{ \text{Number of quotes inside market spread} }{ \text{Total number of quotes} } \\\\
& \text{where bid} > \text{market bid and ask} < \text{market ask}
\end{aligned}
\]
Average Quote Effectiveness: Evaluates the average improvement of our quoted prices over the market reference. \[
\begin{aligned}
\text{Effectiveness} &=
\frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \bigl( (\text{Quoted Bid}_i - \text{Market Bid}_i) + (\text{Market Ask}_i - \text{Quoted Ask}_i) \bigr) \\\\
& \text{where } N \text{ is the total number of quotes}
\end{aligned}
\]
Sharpe Ratio of Quote Effectiveness: Measures the consistency and risk-adjusted performance of our quote placements. \[
\text{Sharpe Ratio} = \frac{\mathbb{E}[\text{Quote Effectiveness}]}{\text{Std}[\text{Quote Effectiveness}]}
\]
These metrics provide a comprehensive evaluation of how effectively the quoting model balances execution competitiveness, market efficiency, and consistency under varying market conditions.
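The four quoting metrics can be computed together, as in this minimal sketch. `quoting_metrics` is a hypothetical helper, and the use of the population standard deviation in the Sharpe ratio is an assumption.

```python
import numpy as np


def quoting_metrics(quoted_bid, quoted_ask, market_bid, market_ask):
    """Compute hit ratio, inside-spread ratio, mean effectiveness and its Sharpe ratio."""
    q_bid, q_ask, m_bid, m_ask = (
        np.asarray(a, dtype=float)
        for a in (quoted_bid, quoted_ask, market_bid, market_ask)
    )
    hit = np.mean((q_bid >= m_bid) & (q_ask <= m_ask))
    inside = np.mean((q_bid > m_bid) & (q_ask < m_ask))
    # Per-quote effectiveness: average improvement over the market on both sides.
    eff = 0.5 * ((q_bid - m_bid) + (m_ask - q_ask))
    std = eff.std()
    sharpe = eff.mean() / std if std > 0 else float("nan")
    return {
        "hit_ratio": hit,
        "inside_ratio": inside,
        "effectiveness": eff.mean(),
        "sharpe": sharpe,
    }


# Three synthetic quotes against a constant 10.0 / 10.5 market, illustration only.
result = quoting_metrics(
    quoted_bid=[10.0, 10.1, 9.9],
    quoted_ask=[10.5, 10.4, 10.6],
    market_bid=[10.0, 10.0, 10.0],
    market_ask=[10.5, 10.5, 10.5],
)
```

In the synthetic example, the first quote matches the market (counted as a hit but not inside the spread), the second improves on it, and the third is wider than the market.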
Figure 2: Boxplot RMSE comparison across models. The bidirectional LSTM achieves the lowest mean RMSE with low variance, indicating strong predictive accuracy and stability.
Based on the RMSE robustness comparison across models, the bidirectional LSTM model was selected as the final volatility forecasting model. While other models may show slightly lower mean RMSE or narrower variance, the bidirectional LSTM demonstrates an optimal balance between these two aspects—achieving both consistently low prediction errors and limited variance across different trading periods. This balance is crucial for real-world deployment, as it ensures that the volatility forecasts remain accurate without significant fluctuations under varying market conditions. Such stability and reliability are essential for providing consistent and actionable insights to support quoting strategies in high-frequency trading environments.
Table 1: Average evaluation metrics for the volatility prediction models across the test data. The LSTM model outperforms WLS and GARCH in both RMSE and QLIKE, demonstrating more accurate and consistent volatility forecasts.
Our final LSTM model outperforms the baseline models across all key metrics. Specifically, it achieves an average RMSE that is ~20% lower than the WLS baseline and ~36% lower than the GARCH baseline. Similarly, the LSTM’s MSE is ~52% lower than WLS and ~68% lower than GARCH, while also exhibiting the lowest QLIKE value, indicating more accurate and consistent volatility predictions. Although LSTM has a longer inference time (0.058s vs. 0.00003s for WLS), its superior predictive accuracy justifies its use as the final volatility model in our quoting pipeline.
Figure 3: RMSE robustness of LSTM forecasts across stocks with varying correlation levels.
We tested the final LSTM volatility prediction model on three stock settings: in-sample (50200), a highly correlated stock (104919), and a low-correlation stock (22753). The correlation was determined by comparing mean log returns of each stock relative to the training stock (50200).
The in-sample stock shows the lowest RMSE and the smallest variance, confirming good in-sample performance. Notably, the high-correlation stock exhibits higher RMSE than the in-sample case but lower RMSE than the low-correlation stock, suggesting that even with strong historical correlation, prediction accuracy does not fully transfer. The low-correlation stock has the highest RMSE and largest variance, highlighting the model’s struggle to generalise across dissimilar assets. Overall, this illustrates that while historical price co-movement offers some guidance, it is not a reliable indicator of cross-stock predictive performance in volatility forecasting.
Figure 4: Quote effectiveness over time, showing how our quoted prices compare to market reference prices.
With a hit ratio of 45.98%, the model effectively balances aggressiveness and passivity, managing to get filled roughly half the time. This reflects a well-calibrated trade-off between execution probability and pricing precision.
The average quote effectiveness is near zero and the Sharpe ratio (-0.0992) is close to zero, suggesting that the quotes gain no systematic price advantage over the market. The model is aligned with market pricing but lacks a predictive edge.
The line plot of quote effectiveness fluctuates around zero, suggesting stationarity. There is no discernible drift or trend in performance, implying that the quoting logic maintains a neutral stance across various market conditions. This consistency supports the idea that the model does not degrade over time and can be reliably deployed without frequent recalibration.
Discussion
Interpretation of Results
Volatility Forecasting: Effective but Conservative
As shown in the Results section, the LSTM model outperformed the traditional baseline models. This finding aligns with C. Zhang et al. (2024), who demonstrated that neural networks outperform linear regression and tree-based models in forecasting intraday realised volatility because they capture complex latent interactions among variables. However, the LSTM still underreacts to sharp volatility spikes, reflecting its tendency to smooth predictions, a known limitation of standard LSTM models. This smoothing reduced responsiveness during high-risk windows, which could be critical in HFT settings. The spike gate and fusion layer improved sensitivity, but future work could explore attention-based or transformer variants to better capture sudden changes.
Quoting Strategy: Realistic but Limited Predictive Edge
The quoting model’s near-50% hit ratio confirms adequate execution performance, yet its lack of consistent pricing advantage suggests that predicted volatility alone may not be sufficient for outperforming market benchmarks. This highlights the need to consider more nuanced factors such as dynamic price trends and market microstructure signals — a point further discussed in the Limitations and Future Work section.
Our tests on the high-correlation and low-correlation stocks revealed that correlation alone does not guarantee predictive transferability. Although the model performed better on the highly correlated stock, it still showed reduced accuracy compared to the in-sample setting. This challenges the assumption that strong co-movement directly translates to robust predictive performance. These findings suggest that while price-level similarities offer some guidance, differences in order book microstructure, liquidity, and trading behaviour can still significantly affect volatility prediction.
Practical Relevance and Application
This study provides market makers and quantitative traders with short-term volatility forecasts to optimise bid-ask quoting strategies, balancing execution risk and market competitiveness in high-frequency trading environments. The final LSTM-based model’s predictions are implemented in an interactive dashboard, which visualises predicted vs actual volatility, evaluates cross-stock performance, and demonstrates how forecasts can be used to guide quoting decisions. Additional screenshots and functionality details are included in the Appendix B: Shiny App. The models and insights developed here are intended for market makers, traders, and quantitative strategists seeking to refine quoting strategies and manage risk in dynamic markets.
Limitations and Future Work
A key limitation of our volatility forecasting pipeline is the LSTM model’s tendency to smooth out sharp volatility spikes, causing delayed responses to sudden market changes. Our attempt to use an attention-based LSTM architecture resulted in even smoother predictions, suggesting that our feature engineering or loss functions were not well aligned with capturing rapid spikes. Additionally, although we conducted a cross-stock analysis, the forecasting model itself did not incorporate explicit cross-stock features, which limits its generalisability to other assets even when historical correlations are strong. Computational constraints and limited time also restricted us from training more complex, multi-stock models and expanding the temporal coverage of our dataset.
The quoting strategy model assumes a static mid-price for the next interval rather than modelling its evolution, and it does not account for practical constraints such as inventory management or risk appetite. In real-world quoting decisions, these factors are critical.
Future work could address these limitations by exploring attention-based or time-aware architectures combined with refined feature engineering to better capture sudden volatility shifts. Incorporating explicit cross-stock features and training generalised multi-stock models would likely improve predictive stability across diverse assets. Finally, testing adaptive quoting algorithms beyond XGBoost and developing a dedicated mid-price prediction model could make quoting strategies more dynamic and responsive to market changes.
Conclusion
This study demonstrates that LSTM-based models significantly outperform traditional approaches in short-term volatility forecasting, reducing RMSE by 20% and 36% compared to WLS and GARCH, respectively. Incorporating these forecasts into an XGBoost-based quoting strategy improved bid-ask spread predictions, confirming the practical relevance of our two-pipeline approach for high-frequency trading environments. However, cross-stock generalisation remains a challenge, as historical correlation alone does not guarantee predictive transferability—highlighting the need for future models to explicitly incorporate shared microstructure features.
- Performed feature engineering
- Built and tuned LSTM model for volatility forecasting
- Integrated WLS, HAR-RV, GARCH models into the workflow
- Developed evaluation pipeline
- Designed presentation slides
- Consolidated and debugged group code into unified, reproducible pipeline in final report
Chenghao Kang
540234745
- Literature review
- Tested and improved XGBoost model for Pipeline 2
- Evaluated Pipeline 1 model and identified the most suitable for Pipeline 2
- Contributed to construction of Pipeline 2
- Final presentation
- Report: Pipeline 2 section and supplemented other sections
Oscar Pham
530417214
- Literature review for Pipeline 2
- Tested LSTM and ARMA-GARCH for Pipeline 1
- Trained and tested models for Pipeline 2
- Developed naive quoting strategy
- Researched evaluation metrics/techniques for quoting strategy
- Final report: Pipeline 2 methods/evaluation/limitations, Discussion/Limitations
Jiayi Li
530109516
- Tested ARMA-GARCH and ARIMA models for Pipeline 1
- Literature review
- Interactive dashboard (reformed using Shiny, Pipeline 2 tab)
- Final presentation
- Final report: interpretation and implications, summary of key findings, significance
Ella Jones
520434145
- Literature review
- Initial XGBoost modelling test
- Evaluation of Pipeline 1 through inter-stock correlation
- Created Figure 1 Schematic Workflow (Presentation and Report)
- Final Presentation: contribution to slides and script
- Final Report: Method Pipeline 1 and thorough editing
Bollerslev, T., and M. Melvin. 1994. “Bid-Ask Spreads and Volatility in the Foreign Exchange Market: An Empirical Analysis.” Journal of International Economics 36 (3–4): 355–72. https://doi.org/10.1016/0022-1996(94)90008-6.
Nelson, D. M., A. C. M. Pereira, and R. A. de Oliveira. 2017. “Stock Market’s Price Movement Prediction with LSTM Neural Networks.” In International Joint Conference on Neural Networks (IJCNN). https://doi.org/10.1109/IJCNN.2017.7966019.
O’Hara, M. 1995. Market Microstructure Theory. Cambridge: Blackwell.
Zhang, C., Y. Zhang, M. Cucuringu, and Z. Qian. 2024. “Volatility Forecasting with Machine Learning and Intraday Commonality.” Journal of Financial Econometrics 22 (2): 492–530.
Zhang, Xuan, Xun Liang, Zhiyu Li, Shusen Zhang, Rui Xu, and Bo Wu. 2019. “AT-LSTM: An Attention-Based LSTM Model for Financial Time Series Prediction.” In IOP Conference Series: Materials Science and Engineering, 569:052037. https://doi.org/10.1088/1757-899X/569/5/052037.
Appendices
Appendix A: Feature Definitions
Intermediate Variables
| Feature | Definition | Formula |
|---|---|---|
| Mid price | Average of best bid and best ask prices | \(\frac{\text{Bid Price} + \text{Ask Price}}{2}\) |
| Bid-ask spread | Difference between the lowest ask price and the highest bid price; average raw spread over the current 330s rolling window | \(\text{Ask Price} - \text{Bid Price}\), averaged over the rolling window |
Appendix B: Shiny App
Homepage
Volatility Forecasting
Quoting Strategies
Source Code
---
title: "Precision Volatility Forecasting for Strategic Quote Placement in High-Frequency Trading"
subtitle: "DATA3888 Data Science Capstone Project"
author: "Optiver Stream, Group 22"
format:
  html:
    code-tools: true
    code-fold: true
    fig_caption: yes
    embed-resources: true
    theme: flatly
    css:
      - https://use.fontawesome.com/releases/v5.0.6/css/all.css
    toc: true
    toc_depth: 4
    toc_float: true
    margin-width: 350px
execute:
  cache: true
  cache-path: _cache
  cache-depth: 0
bibliography: ref.bib
jupyter: python3
---

```{python}
import os
import pandas as pd
import numpy as np
import importlib
from pathlib import Path
import src.util as util
import src.rv as rv
import src.lstm as lstm
import src.garch as garch
import src.pipeline2 as p2

_ = importlib.reload(util)
_ = importlib.reload(rv)
_ = importlib.reload(garch)
_ = importlib.reload(lstm)
_ = importlib.reload(p2)

BUILD_MODEL = False
RUN_EVALUATION = False

os.makedirs('temp/insample', exist_ok=True)
os.makedirs('temp/outsample', exist_ok=True)
os.makedirs('temp/pipeline2', exist_ok=True)
os.makedirs('models/lstm', exist_ok=True)
```

:::{.callout-note}
This work can be found in this [GitHub repository](https://github.com/mofrui/volatility-analysis).
:::
MethodologyOur overall methodology consists of two main pipelines: the first focuses on forecasting short-term volatility, and the second uses these forecasts to inform quoting strategies (see Figure 1).```{python}#| fig-cap: "Figure 1: Schematic workflow overview."from IPython.display import ImageImage(filename='resources/figure1.png')```## Feature Engineering and Data PreparationFeature engineering was applied to the reconstructed dataset to generate meaningful variables capturing market dynamics.For the volatility forecasting pipeline, engineered features include:- Mid price: average of bid and ask prices- Bid-ask spread: difference between the lowest ask price and the highest bid price- Weighted average price- Spread percentage- Order book imbalance- Depth ratio- Log return and log WAP change- Rolling standard deviation of log returns- Spread z-score- Volume imbalanceFor the volatility-informed quoting strategy pipeline, the key input feature is the predicted short-term volatility (`predicted_volatility_lead1`) from the final LSTM model, combined with order book-based features.Detailed feature definitions and formulas are provided in the Appendix A: Feature Definitions.To capture short-term volatility dynamics, we used a rolling window approach, segmenting the high-frequency order book data into overlapping samples.We evaluated different configurations—window size (W), forecast horizon (H), and step size (S)—using a Random Forest baseline to identify the best balance between prediction accuracy (MSE) and sample richness.Results showed a U-shaped error curve, with overly large or small windows hurting performance. 
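The rolling-window segmentation described above can be sketched as follows. This is a minimal illustration rather than the project's actual implementation: the function name `rolling_windows` is hypothetical, the input is a synthetic series of log returns, and the target is assumed to be realised volatility (the square root of the sum of squared log returns) over the forecast horizon. The default arguments mirror the W, H, and S values adopted in this report.

```python
import math
import random


def rolling_windows(log_returns, window=330, horizon=10, step=5):
    """Slice a series into overlapping (history, target) samples.

    Each sample pairs `window` consecutive observations with the realised
    volatility of the next `horizon` observations (an assumed target).
    """
    samples = []
    last_start = len(log_returns) - window - horizon
    for start in range(0, last_start + 1, step):
        end = start + window
        history = log_returns[start:end]
        future = log_returns[end:end + horizon]
        realised_vol = math.sqrt(sum(r * r for r in future))
        samples.append((history, realised_vol))
    return samples


# Synthetic one-hour bucket of per-second log returns (illustrative only).
random.seed(0)
demo_returns = [random.gauss(0.0, 1e-4) for _ in range(3600)]
samples = rolling_windows(demo_returns)
print(len(samples), len(samples[0][0]))
```

A smaller step yields more, but more heavily overlapping, samples from each bucket, which is the accuracy versus sample-richness trade-off the configuration search explores.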
We selected W=330s, H=10s, and S=5s as the optimal configuration for all model development.

## Volatility Forecasting Models

```{python}
feature_cols = [
    "wap", "spread_pct", "imbalance", "depth_ratio", "log_return",
    "log_wap_change", "rolling_std_logret", "spread_zscore", "volume_imbalance",
]

if BUILD_MODEL:
    # Linear baseline: HAV-RV fitted with weighted least squares
    _, wls_val_df = rv.wls(df_main_train)
    wls_val_df.to_csv('temp/insample/wls_val_df.csv')

    # GARCH baseline
    garch_val_df = garch.garch(df_main_train)
    garch_val_df.to_csv('temp/insample/garch_val_df.csv')

    # Plain LSTM baseline
    _, baseline_val_df = lstm.baseline(df_main_train, epochs=50)
    baseline_val_df.to_csv('temp/insample/baseline_val_df.csv')

    # Mixture-of-Experts LSTM
    _, moe_val_df = lstm.moe(df_main_train, feature_cols, epochs=50)
    moe_val_df.to_csv('temp/insample/moe_val_df.csv')

    # Mixture-of-Experts LSTM with two-stage training (final model)
    _, _, moe_staged_val_df = lstm.moe_staged(df_main_train, feature_cols, epochs=50)
    moe_staged_val_df.to_csv('temp/insample/moe_staged_val_df.csv')
```

The first pipeline focuses on generating accurate short-term volatility predictions, a crucial capability for HFT firms to optimise options pricing and quoting strategies. To this end, we developed a robust model to capture the complex volatility behaviour of the selected stock.

Our models were trained using sequential order book data enhanced with engineered features. Due to computational constraints, we initially selected 50 consecutive `time_ids` from stock 50200 as the development set, using an 80/20 chronological train-test split to maintain temporal consistency.

We trialled three initial candidates for volatility prediction: HAV-RV (with weighted least squares as a linear baseline), GARCH, and Long Short-Term Memory (LSTM) networks. The optimal GARCH hyperparameters (`p=1`, `q=2`) were determined by grid search, minimising RMSE. LSTM networks, though less interpretable due to their gated architecture, demonstrated superior capability in capturing long-term dependencies and nonlinear volatility patterns [@LSTM2023advantages]. We iteratively tuned the LSTM model’s architecture and hyperparameters to achieve the best trade-off between complexity and performance.

The final volatility forecasting model is a two-layer bidirectional LSTM architecture integrating a Mixture-of-Experts (MoE) approach. It consists of two stacked bidirectional LSTM layers (128 hidden units each) and a dense fusion layer (64 ReLU units). The MoE head features two expert outputs, one capturing normal market behaviour and another focused on spikes, combined via a learned spike-probability gating mechanism. To balance precision and stability during training, we employed a two-stage loss strategy: an initial weighted MSE loss focusing on volatility magnitude, followed by a combined log-cosh and focal BCE loss to refine spike detection and robustness. This dual-headed design allows the model to adapt dynamically to different volatility regimes, enhancing both accuracy and interpretability.

To prevent data leakage, we ensured that the 80/20 train-test split strictly adhered to non-overlapping `time_ids`. Additional safeguards included separate data generators for training and validation, feature normalisation within each rolling window, and clipping outlier values to stabilise predictions. Model selection was based on RMSE performance, with further discussion and detailed evaluation metrics provided in the following sections. This approach is intended to ensure that the volatility forecasts feeding into our quoting strategy are accurate, robust, and well suited to real-world trading environments.

## Volatility-Informed Quoting Strategy Model

```{python}
# Prepare LSTM predictions from Pipeline 1
cache_dir = Path("temp/pipeline2")
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file = cache_dir / "predictions_spy.csv"

# Trained Pipeline 1 model and scalers; defined here so that the
# cache-miss branch below does not depend on a later chunk
model_path = "models/lstm/moe_staged.h5"
scaler_path = "models/lstm/moe_staged_scalers.pkl"

if cache_file.is_file():
    pred_df = pd.read_csv(cache_file)
else:
    basic_features = [
        "wap", "spread_pct", "imbalance", "depth_ratio",
        "log_return", "log_wap_change", "rolling_std_logret",
        "spread_zscore", "volume_imbalance",
    ]
    val_df = util.out_of_sample_evaluation(
        model_path, scaler_path, df_main_train, basic_features
    )
    pred_df = val_df.rename(columns={"y_pred": "predicted_volatility_lead1"})
    pred_df.to_csv(cache_file, index=False)

# Train the spread model on order book features plus predicted volatility
best_model, eval_metrics = p2.train_bid_ask_spread_model(
    df_main_train,
    pred_df,
    cache_dir="models/pipeline2",
    model_save_path="models/pipeline2/bid_ask_spread_model.pkl",
)

# Generate bid and ask quotes from the predicted spread
result = p2.generate_quote(
    pred_df,
    df_main_train,
    spread_model_path="models/pipeline2/bid_ask_spread_model.pkl",
    stock_id=50200,
)
```

This quoting strategy model uses the predicted volatility from the volatility forecasting model and current order book signals to generate quoting strategies that adapt to market conditions. This enhances interpretability and helps market makers adjust bid-ask spreads for high-frequency trading (HFT).

Key features include the predicted short-term volatility (`predicted_volatility_lead1`) and engineered order book features: `spread_pct_scaled`, `wap`, `imbalance`, `depth_ratio`, `log_return`, and `bid_ask_spread`. A rolling window approach with a 330-bucket segment and a 10-bucket stride was used, shifting the bid-ask spread target forward by one step to avoid data leakage.

An XGBoost model was implemented to estimate the next-period bid-ask spread. XGBoost (Extreme Gradient Boosting) is a form of decision tree-based machine learning, advantageous for its high accuracy, scalability, and built-in regularisation to prevent overfitting. While it is not traditionally suited for time series data, the incorporation of rolling windows and lagged features overcomes this limitation [@xgboosting2023adv].

Z-score normalisation (StandardScaler) was applied to standardise numerical features with high variance. A chronological 80/20 train-test split was used to preserve the order of the time series, alongside 5-fold cross-validation. Hyperparameters were selected by a grid search over 768 candidate configurations.

To assess performance, the model was compared against a naive baseline (previous spread as prediction), using metrics including MSE, MAE, RMSE, R², absolute error (AE), squared error (SE), and percentage error (PE).

The current mid-price, calculated as the average of the best bid and ask, was used as a simple estimator for the next-period mid-price. The final quoting prices were generated using:

$$
\text{bid} = \text{mid-price} - \frac{\text{spread}}{2}
$$

$$
\text{ask} = \text{mid-price} + \frac{\text{spread}}{2}
$$

## Evaluation Metrics

To assess the performance of our two main models (the Volatility Forecasting model and the Volatility-Informed Quoting Strategy model), we used complementary evaluation metrics tailored to each stage’s goals.

### Volatility Forecasting Model (Pipeline 1)

The primary objective of the volatility forecasting model is to provide accurate short-term volatility predictions that serve as key inputs for the quoting model. Metrics used include:

- **RMSE (Root Mean Squared Error)**: Measures the average magnitude of prediction errors. Smaller errors are critical for providing precise inputs to the quoting model.

$$
\text{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 }
$$

- **QLIKE (Quasi-Likelihood Loss)**: Focuses on the accuracy of volatility forecasts relative to actual variance, which is important for financial volatility modelling.

$$
\text{QLIKE} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i^2}{\hat{y}_i^2} - \log \left( \frac{y_i^2}{\hat{y}_i^2} \right) - 1 \right)
$$

- **MSE (Mean Squared Error)**: Provides a standard measure of average squared prediction errors.

$$
\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
$$

- **Inference Time**: Measures the time taken to produce each prediction, which matters in high-frequency environments.

Among these, RMSE is considered the most critical metric because the quoting model depends on accurate volatility forecasts.
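The accuracy metrics defined above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's evaluation code: the function names are hypothetical, the sample values are made up, and the small `eps` term in `qlike` is an added assumption to guard against division by zero when a forecast is exactly zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted volatility."""
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def qlike(actual, predicted, eps=1e-12):
    """QLIKE loss on squared volatilities; eps is an assumed numerical guard."""
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        ratio = (y ** 2 + eps) / (y_hat ** 2 + eps)
        total += ratio - math.log(ratio) - 1.0
    return total / len(actual)


# Made-up volatility values, for illustration only.
actual = [0.0012, 0.0015, 0.0031]
predicted = [0.0011, 0.0016, 0.0024]
print(f"RMSE:  {rmse(actual, predicted):.6f}")
print(f"QLIKE: {qlike(actual, predicted):.6f}")
```

Both losses are zero for a perfect forecast. Unlike RMSE, QLIKE is asymmetric: for the same ratio between forecast and outcome, under-predicting volatility is penalised more heavily than over-predicting it.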
Lower RMSE in Pipeline 1 is expected to yield more precise bid-ask spread predictions in Pipeline 2, directly impacting quoting effectiveness.

### Volatility-Informed Quoting Strategy Model (Pipeline 2)

To evaluate the quoting strategy model’s performance, we employed four microstructure-based metrics:

- **Hit Ratio**: Measures how often our quotes are competitive enough to be executed.

$$
\begin{aligned}
\text{Hit Ratio} &= \frac{\text{Number of competitive quotes}}{\text{Total number of quotes}} \\
& \text{where bid} \geq \text{market bid and ask} \leq \text{market ask}
\end{aligned}
$$

- **Inside-Spread Quote Ratio**: Assesses whether quotes are placed inside the market spread for better execution.

$$
\begin{aligned}
\text{Inside-Spread Quote Ratio} &= \frac{\text{Number of quotes inside market spread}}{\text{Total number of quotes}} \\
& \text{where bid} > \text{market bid and ask} < \text{market ask}
\end{aligned}
$$

- **Average Quote Effectiveness**: Evaluates the average improvement of our quoted prices over the market reference.

$$
\begin{aligned}
\text{Effectiveness} &= \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \bigl( (\text{Quoted Bid}_i - \text{Market Bid}_i) + (\text{Market Ask}_i - \text{Quoted Ask}_i) \bigr) \\
& \text{where } N \text{ is the total number of quotes}
\end{aligned}
$$

- **Sharpe Ratio of Quote Effectiveness**: Measures the consistency and risk-adjusted performance of our quote placements.

$$
\text{Sharpe Ratio} = \frac{\mathbb{E}[\text{Quote Effectiveness}]}{\text{Std}[\text{Quote Effectiveness}]}
$$

These metrics provide a comprehensive evaluation of how effectively the quoting model balances execution competitiveness, market efficiency, and consistency under varying market conditions.

# Results

## Model Performance Comparison

```{python}
#| fig-cap: "Figure 2: Boxplot RMSE comparison across models. The bidirectional LSTM achieves the lowest mean RMSE with low variance, indicating strong predictive accuracy and stability."
wls_val_df = pd.read_csv('temp/insample/wls_val_df.csv')
garch_val_df = pd.read_csv('temp/insample/garch_val_df.csv')
baseline_val_df = pd.read_csv('temp/insample/baseline_val_df.csv')
moe_val_df = pd.read_csv('temp/insample/moe_val_df.csv')
bilstm_val_df = pd.read_csv('temp/insample/moe_staged_val_df.csv')

val_dfs = {
    'wls_baseline': wls_val_df,
    'garch_baseline': garch_val_df,
    'lstm_baseline': baseline_val_df,
    'moe_lstm': moe_val_df,
    'bidirectional_lstm': bilstm_val_df,
}
util.plot_rmse_robustness(val_dfs)
```

Based on the RMSE robustness comparison across models, the bidirectional LSTM model was selected as the final volatility forecasting model. While other models may show slightly lower mean RMSE or narrower variance, the bidirectional LSTM offers the best balance between these two aspects, achieving both consistently low prediction errors and limited variance across different trading periods. This balance is crucial for real-world deployment, as it helps the volatility forecasts remain accurate without significant fluctuations under varying market conditions. Such stability and reliability are essential for providing consistent and actionable insights to support quoting strategies in high-frequency trading environments.

```{python}
#| fig-cap: "Table 1: Average evaluation metrics for the volatility prediction models across the test data. The LSTM model outperforms WLS and GARCH in both RMSE and QLIKE, demonstrating more accurate and consistent volatility forecasts."
val_dfs_metrics = {
    'WLS': wls_val_df,
    'GARCH': garch_val_df,
    'LSTM': bilstm_val_df,
}
out = util.create_evaluation_metrics_table(val_dfs_metrics)
display(out)
```

Our final LSTM model outperforms the baseline models across all accuracy metrics. Specifically, it achieves an average RMSE that is ~20% lower than the WLS baseline and ~36% lower than the GARCH baseline. Similarly, the LSTM’s MSE is ~52% lower than WLS and ~68% lower than GARCH, while also exhibiting the lowest QLIKE value, indicating more accurate and consistent volatility predictions. Although the LSTM has a longer inference time (0.058s vs. 0.00003s for WLS), its superior predictive accuracy justifies its use as the final volatility model in our quoting pipeline.

## Cross-Stock Generalisation Analysis

```{python}
#| fig-cap: "Figure 3: RMSE robustness of LSTM forecasts across stocks with varying correlation levels."
model_path = "models/lstm/moe_staged.h5"
scaler_path = "models/lstm/moe_staged_scalers.pkl"
feature_cols = [
    "wap", "spread_pct", "imbalance", "depth_ratio",
    "log_return", "log_wap_change",
    "rolling_std_logret", "spread_zscore", "volume_imbalance",
]

# Evaluate the trained model on each comparison stock, caching results
val_dfs_cross = {}
cache_dir = 'temp/outsample'
for stock_id, df_feat in df_cross_features.items():
    cache_file = f'{cache_dir}/{stock_id}.csv'
    if RUN_EVALUATION or not os.path.isfile(cache_file):
        val_df = util.out_of_sample_evaluation(model_path, scaler_path, df_feat, feature_cols)
        val_df.to_csv(cache_file, index=False)
    else:
        val_df = pd.read_csv(cache_file)
    val_dfs_cross[stock_id] = val_df

in_sample_df = pd.read_csv('temp/insample/moe_staged_val_df.csv')
val_dfs_for_plot = {
    "In Sample": in_sample_df,
    "High Correlation Stock": val_dfs_cross[104919],
    "Low Correlation Stock": val_dfs_cross[22753],
}
util.plot_rmse_robustness(val_dfs_for_plot)
```

We tested the final LSTM volatility prediction model on three stock settings: in-sample (50200), a highly correlated
stock (104919), and a low-correlation stock (22753). The correlation was determined by comparing mean log returns of each stock relative to the training stock (50200).

The in-sample stock shows the lowest RMSE and the smallest variance, confirming good in-sample performance. The high-correlation stock exhibits higher RMSE than the in-sample case but lower than the low-correlation stock, suggesting that even with strong historical correlation, prediction accuracy does not fully transfer. The low-correlation stock has the highest RMSE and largest variance, highlighting the model’s struggle to generalise across dissimilar assets. Overall, this illustrates that while historical price co-movement offers some guidance, it is not a reliable indicator of cross-stock predictive performance in volatility forecasting.

## Quoting Strategy Effectiveness (Bid-Ask Spread Prediction)

```{python}
#| fig-cap: "Figure 4: Quote effectiveness over time, showing how our quoted prices compare to market reference prices."
cache_dir = Path("temp/pipeline2")
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file_test = cache_dir / "predictions_spy_test.csv"

if cache_file_test.is_file():
    val_df_test = pd.read_csv(cache_file_test)
else:
    basic_features = [
        "wap", "spread_pct", "imbalance", "depth_ratio",
        "log_return", "log_wap_change", "rolling_std_logret",
        "spread_zscore", "volume_imbalance",
    ]
    val_df_test = util.out_of_sample_evaluation(
        model_path, scaler_path, df_main_test, basic_features
    )
    val_df_test = val_df_test.rename(columns={"y_pred": "predicted_volatility_lead1"})
    val_df_test.to_csv(cache_file_test, index=False)

# Evaluate generated quotes against the market on the held-out time_ids
p2_metrics = p2.evaluate_quote_strategy(
    val_df_test,
    df_main_test,
    spread_model_path="models/pipeline2/bid_ask_spread_model.pkl",
)
```

With a hit ratio of 45.98%, the model balances aggressiveness and passivity, with quotes competitive enough to be filled in just under half of cases. This reflects a reasonably calibrated trade-off between execution probability and pricing precision. The average quote effectiveness is near zero, and the Sharpe ratio (-0.0992) is close to zero, suggesting that the quote placement captures no exploitable pricing edge. The model is aligned with market pricing but lacks predictive edge.

The line plot of quote effectiveness fluctuates around zero, suggesting stationarity. There is no discernible drift or trend in performance, implying that the quoting logic maintains a neutral stance across the market conditions observed. This consistency suggests that, over the test period, the model does not degrade over time and could be deployed without frequent recalibration.

# Discussion

## Interpretation of Results

### Volatility Forecasting: Effective but Conservative

As shown in the Results section, the LSTM model outperformed traditional baseline models. This finding aligns with Zhang et al. (2022), who demonstrated that neural networks outperform linear regression and tree-based models in forecasting intra-day realised volatility due to their ability to capture complex latent interactions among variables. However, the LSTM still underreacts to sharp volatility spikes, reflecting its tendency to smooth predictions, a known limitation of standard LSTM models. This smoothing effect reduced responsiveness during high-risk windows, which could be critical in HFT settings. The introduction of the spike gate and fusion layer improved sensitivity, but future work could explore attention-based or transformer variants to better capture sudden changes.

### Quoting Strategy: Realistic but Limited Predictive Edge

The quoting model’s near-50% hit ratio confirms adequate execution performance, yet its lack of consistent pricing advantage suggests that predicted volatility alone may not be sufficient for outperforming market benchmarks. This highlights the need to consider more nuanced factors such as dynamic price trends and market microstructure signals, a point further discussed in the Limitations and Future Work section.

### Cross-Stock Generalisation: Correlation ≠ Transferability

Our tests on a high-correlation stock and a low-correlation stock revealed that correlation alone does not guarantee predictive transferability. Although the model performed better on the highly correlated stock, it still showed reduced accuracy compared to the in-sample setting. This challenges the assumption that strong co-movement directly translates to robust predictive performance. These findings suggest that while price-level similarities offer some guidance, differences in order book microstructure, liquidity, and trading behaviour can still significantly affect volatility prediction.

## Practical Relevance and Application

This study provides market makers and quantitative traders with short-term volatility forecasts to optimise bid-ask quoting strategies, balancing execution risk and market competitiveness in high-frequency trading environments. The final LSTM-based model’s predictions are implemented in an interactive dashboard, which visualises predicted vs actual volatility, evaluates cross-stock performance, and demonstrates how forecasts can be used to guide quoting decisions. Additional screenshots and functionality details are included in Appendix B: Shiny App.

The models and insights developed here are intended for market makers, traders, and quantitative strategists seeking to refine quoting strategies and manage risk in dynamic markets.

## Limitations and Future Work

A key limitation of our volatility forecasting pipeline is the LSTM model’s tendency to smooth out sharp volatility spikes, causing delayed responses to sudden market changes. Our attempts to use an attention-based LSTM architecture resulted in even smoother predictions, suggesting that our feature engineering or loss functions may not have been well aligned with capturing such rapid spikes.

Additionally, although we conducted a cross-stock analysis, the forecasting model itself did not incorporate explicit cross-stock features. This limits its generalisability to other assets despite
strong historical correlations. Computational constraints and limited time also restricted us from training more complex, multi-stock models and expanding the temporal coverage of our dataset.

The quoting strategy model similarly assumes a static mid-price for the next interval, without explicitly modelling its dynamic evolution, and does not account for practical constraints like inventory management or risk appetite. These omissions can affect real-world quoting decisions, where such factors are critical.

Future work could address these limitations by exploring attention-based or time-aware architectures combined with refined feature engineering to better capture sudden volatility shifts. Incorporating explicit cross-stock features and training generalised multi-stock models would likely improve predictive stability across diverse assets. Finally, testing adaptive quoting algorithms beyond XGBoost and developing a dedicated mid-price prediction model could make quoting strategies more dynamic and responsive to market changes.

# Conclusion

This study demonstrates that LSTM-based models outperform traditional approaches in short-term volatility forecasting, reducing RMSE by ~20% and ~36% compared to WLS and GARCH, respectively. Incorporating these forecasts into an XGBoost-based quoting strategy produced quotes aligned with market pricing (hit ratio of ~46%) but with limited pricing advantage, supporting the practical relevance of our two-pipeline approach for high-frequency trading environments while leaving room for improvement. However, cross-stock generalisation remains a challenge, as historical correlation alone does not guarantee predictive transferability, highlighting the need for future models to explicitly incorporate shared microstructure features.

# Student Contributions

| Name | Student ID | Contributions |
|------|------------|---------------|
| Daisy Lim | 520204962 | - Research, literature review <br> - Initial baseline model: HAV-RV OLS & WLS <br> - Interactive dashboard (About page, Volatility Prediction tab, integrated Quoting Strategies tab) <br> - Requirements/installation script <br> - Final presentation & contributed to slides <br> - Report: Executive Summary, Background, Discussion (limitations & improvements), Conclusion (future work) <br> - Contributed to README |
| Junrui Kang | 530531740 | - Performed feature engineering <br> - Built and tuned LSTM model for volatility forecasting <br> - Integrated WLS, HAR-RV, GARCH models into the workflow <br> - Developed evaluation pipeline <br> - Designed presentation slides <br> - Consolidated and debugged group code into unified, reproducible pipeline in final report |
| Chenghao Kang | 540234745 | - Literature review <br> - Tested and improved XGBoost model for Pipeline 2 <br> - Evaluated Pipeline 1 model and identified the most suitable for Pipeline 2 <br> - Contributed to construction of Pipeline 2 <br> - Final presentation <br> - Report: Pipeline 2 section and supplemented other sections |
| Oscar Pham | 530417214 | - Literature review for Pipeline 2 <br> - Tested LSTM and ARMA-GARCH for Pipeline 1 <br> - Trained and tested models for Pipeline 2 <br> - Developed naive quoting strategy <br> - Researched evaluation metrics/techniques for quoting strategy <br> - Final report: Pipeline 2 methods/evaluation/limitations, Discussion/Limitations |
| Jiayi Li | 530109516 | - Tested ARMA-GARCH and ARIMA models for Pipeline 1 <br> - Literature review <br> - Interactive dashboard (reformed using Shiny, Pipeline 2 tab) <br> - Final presentation <br> - Final report: interpretation and implications, summary of key findings, significance |
| Ella Jones | 520434145 | - Literature review <br> - Initial XGBoost modelling test <br> - Evaluation of Pipeline 1 through inter-stock correlation <br> - Created Figure 1 Schematic Workflow (Presentation and Report) <br> - Final Presentation: contribution to slides and script <br> - Final Report: Method Pipeline 1 and thorough editing |

# References

This document includes references for further reading [@zhang2024; @wiese2020; @havrv2024overview].

::: {#refs}
:::

# Appendices

## Appendix A: Feature Definitions

### Intermediate Variables

| Feature | Definition | Formula |
|---------|------------|---------|
| Mid price | Average of best bid and best ask prices | $\frac{\text{Bid Price} + \text{Ask Price}}{2}$ |
| Bid-ask spread | Difference between the lowest ask price and the highest bid price | $\text{Ask Price} - \text{Bid Price}$ |

### Pipeline 1: Volatility Forecasting

| Feature | Definition | Formula |
|---------|------------|---------|
| Weighted average price (WAP) | Weighted average price of bid and ask | $\frac{(\text{Bid Price} \times \text{Ask Size}) + (\text{Ask Price} \times \text{Bid Size})}{\text{Bid Size} + \text{Ask Size}}$ |
| Spread percentage (spread\_pct) | Spread as a percentage of the mid price | $\frac{\text{Ask Price} - \text{Bid Price}}{\text{Mid Price}}$ |
| Order book imbalance (imbalance) | Snapshot-based imbalance between bid and ask | $\frac{\text{Bid Size} - \text{Ask Size}}{\text{Bid Size} + \text{Ask Size}}$ |
| Depth ratio | Market depth ratio of bid to ask size | $\frac{\text{Bid Size}}{\text{Ask Size}}$ |
| Log return | Log return of WAP between snapshots | $\log\left(\frac{\text{WAP}_t}{\text{WAP}_{t-1}}\right)$ |
| Log WAP change (log\_wap\_change) | Difference in log WAP values | $\log(\text{WAP}_t) - \log(\text{WAP}_{t-1})$ |
| Rolling standard deviation of log returns | Short-term volatility of log returns | $\text{std}(\text{log return}_{t-k}, \dots, \text{log return}_t)$ |
| Spread z-score (spread\_zscore) | Z-score of spread percentage within a rolling window | $\frac{\text{Spread}_t - \mu_{\text{Spread}}}{\sigma_{\text{Spread}}}$ |
| Volume imbalance | Difference between bid and ask sizes | $\text{Bid Size} - \text{Ask Size}$ |

### Pipeline 2: Volatility-Informed Quoting Strategies

| Feature | Definition | Formula |
|---------|------------|---------|
| Predicted short-term volatility | Predicted short-term volatility from Pipeline 1 (LSTM output) | From LSTM model; used as a key input |
| Weighted average price (WAP) | Weighted average price of bid and ask | $\frac{(\text{Bid Price} \times \text{Ask Size}) + (\text{Ask Price} \times \text{Bid Size})}{\text{Bid Size} + \text{Ask Size}}$ |
| Standardised spread percentage | Z-score scaled spread percentage | $\frac{\text{Ask Price} - \text{Bid Price}}{\text{Mid Price}}$, then Z-score scaled |
| Order book imbalance (imbalance) | Snapshot-based imbalance between bid and ask | $\frac{\text{Bid Size} - \text{Ask Size}}{\text{Bid Size} + \text{Ask Size}}$ |
| Depth ratio | Market depth ratio of bid to ask size | $\frac{\text{Bid Size}}{\text{Ask Size}}$ |
| Log return | Log return of WAP between snapshots | $\log\left(\frac{\text{WAP}_t}{\text{WAP}_{t-1}}\right)$ |
| Average bid-ask spread | Average raw spread over the current 330s rolling window | $\text{Ask Price} - \text{Bid Price}$, averaged over the rolling window |

## Appendix B: Shiny App